Gamma distribution#
The Gamma distribution is a flexible continuous distribution on positive real numbers. It’s a natural model for waiting times, lifetimes, and other positive, right-skewed quantities.
1) Title & classification#
| Item | Value |
|---|---|
| Name | Gamma (\(\text{Gamma}(\alpha,\theta)\)) |
| Type | Continuous |
| Support | \(x \in (0,\infty)\) |
| Parameters (common) | shape \(\alpha>0\), scale \(\theta>0\) |
| Alternative | shape \(\alpha>0\), rate \(\beta = 1/\theta > 0\) |
What you’ll be able to do after this notebook#
recognize when a Gamma model makes sense (and when it doesn’t)
write the PDF/CDF and compute key moments
interpret how \((\alpha, \theta)\) change the shape
derive the mean/variance and the log-likelihood
sample from a Gamma using NumPy only (Marsaglia–Tsang)
use `scipy.stats.gamma` for inference and simulation
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from scipy import stats
from scipy.special import gammainc, gammaln, psi
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
2) Intuition & motivation#
What it models#
A classic story:
Events arrive according to a Poisson process with rate \(\lambda\).
The waiting time between events is exponential.
The waiting time until the \(k\)-th event is Gamma.
Even when that exact Poisson-process story isn’t literally true, Gamma is often a good model for positive quantities with right skew.
Typical real-world use cases#
Reliability / survival: time-to-failure for components under certain assumptions
Queueing / service times: durations that are sums of smaller random delays
Insurance / finance: positive claim sizes, severities, waiting times
Hydrology / weather: rainfall amounts over fixed intervals
Bayesian stats: priors for rates (Poisson/exponential) and precisions
Relations to other distributions#
Exponential: \(\text{Gamma}(\alpha=1,\theta)\)
Erlang (integer shape): \(\alpha=k\in\{1,2,\dots\}\); sum of \(k\) exponentials
Chi-square: if \(X\sim\chi^2_\nu\), then \(X \sim \text{Gamma}(\alpha=\nu/2,\theta=2)\)
Additivity: if \(X_i\sim\text{Gamma}(\alpha_i,\theta)\) are independent with shared scale \(\theta\), then \(\sum_i X_i\sim\text{Gamma}(\sum_i\alpha_i,\theta)\)
Normal approximation: for large \(\alpha\), Gamma becomes close to normal (by CLT-like behavior)
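Two of these relations can be checked numerically against SciPy (a quick sanity check; nothing here beyond `scipy.stats`):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 10.0, 50)
theta = 2.0

# Gamma(alpha=1, theta) has exactly the Exponential(scale=theta) PDF
exp_match = np.allclose(stats.gamma(a=1.0, scale=theta).pdf(x),
                        stats.expon(scale=theta).pdf(x))

# Chi-square with nu degrees of freedom is Gamma(alpha=nu/2, theta=2)
nu = 5
chi2_match = np.allclose(stats.chi2(df=nu).pdf(x),
                         stats.gamma(a=nu / 2.0, scale=2.0).pdf(x))
```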
A practical note: different fields use scale \(\theta\) or rate \(\beta=1/\theta\). Always check which one a library/API expects.
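A minimal illustration of that note: `scipy.stats.gamma` takes a *scale*, so a rate must be inverted before being passed in.

```python
import numpy as np
from scipy import stats

alpha = 2.5
theta = 1.3          # scale
beta = 1.0 / theta   # the same distribution, expressed as a rate

# scipy.stats.gamma expects a shape `a` and a `scale` (never a rate),
# so a rate parameterization must be converted via scale = 1 / rate.
d_from_scale = stats.gamma(a=alpha, scale=theta)
d_from_rate = stats.gamma(a=alpha, scale=1.0 / beta)

same_pdf = np.isclose(d_from_scale.pdf(2.0), d_from_rate.pdf(2.0))
```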
3) Formal definition#
We use the shape–scale parameterization: \(\alpha>0\) (shape), \(\theta>0\) (scale).
Gamma function#
The normalizing constant uses the Gamma function
$$ \Gamma(\alpha)=\int_0^\infty t^{\alpha-1}e^{-t}\,dt, $$
which satisfies \(\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha)\) and \(\Gamma(k)=(k-1)!\) for integer \(k\ge 1\).
PDF#
For \(x>0\):

$$ f(x\mid\alpha,\theta) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\,x^{\alpha-1} e^{-x/\theta} $$
and \(f(x)=0\) for \(x\le 0\).
CDF#
The CDF can be written using the lower incomplete gamma function. A convenient standard form uses the regularized lower incomplete gamma:

$$ F(x\mid\alpha,\theta) = P\left(\alpha, \frac{x}{\theta}\right) = \frac{\gamma\left(\alpha, x/\theta\right)}{\Gamma(\alpha)} $$

where \(\gamma(\alpha, z)=\int_0^z u^{\alpha-1}e^{-u}\,du\).
def gamma_logpdf(x, alpha, theta):
    """Log-PDF of Gamma(alpha, theta) with support (0, inf)."""
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")
    logpdf = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    logpdf[mask] = (
        (alpha - 1) * np.log(x[mask]) - x[mask] / theta - gammaln(alpha) - alpha * np.log(theta)
    )
    return logpdf


def gamma_pdf(x, alpha, theta):
    return np.exp(gamma_logpdf(x, alpha, theta))


def gamma_cdf(x, alpha, theta):
    """CDF via the regularized lower incomplete gamma P(alpha, x/theta)."""
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")
    cdf = np.zeros_like(x, dtype=float)
    mask = x > 0
    cdf[mask] = gammainc(alpha, x[mask] / theta)
    return cdf


def gamma_entropy(alpha, theta):
    """Differential entropy of Gamma(alpha, theta)."""
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")
    return alpha + np.log(theta) + gammaln(alpha) + (1 - alpha) * psi(alpha)


def gamma_mgf(t, alpha, theta):
    """MGF M_X(t) for t < 1/theta; diverges for t >= 1/theta."""
    t = np.asarray(t, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")
    out = np.full_like(t, np.inf, dtype=float)
    mask = t < 1 / theta
    out[mask] = (1 - theta * t[mask]) ** (-alpha)
    return out


def gamma_cf(t, alpha, theta):
    """Characteristic function phi_X(t)."""
    t = np.asarray(t, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")
    return (1 - 1j * theta * t) ** (-alpha)
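As a self-contained cross-check of the entropy formula used above, the closed form can be compared against SciPy's own computation:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln, psi

alpha, theta = 2.5, 1.3

# Closed-form differential entropy of Gamma(alpha, theta)
h_formula = alpha + np.log(theta) + gammaln(alpha) + (1.0 - alpha) * psi(alpha)

# SciPy evaluates the same quantity for the frozen distribution
h_scipy = float(stats.gamma(a=alpha, scale=theta).entropy())
```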
4) Moments & properties#
For \(X\sim\text{Gamma}(\alpha,\theta)\) (shape–scale):
| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X]=\alpha\theta\) |
| Variance | \(\mathrm{Var}(X)=\alpha\theta^2\) |
| Skewness | \(\gamma_1 = \frac{2}{\sqrt{\alpha}}\) |
| Excess kurtosis | \(\gamma_2 = \frac{6}{\alpha}\) (so kurtosis \(=3+6/\alpha\)) |
| Mode | \((\alpha-1)\theta\) for \(\alpha\ge 1\) (otherwise at \(0^+\)) |
| MGF | \(M_X(t)=(1-\theta t)^{-\alpha}\) for \(t<1/\theta\) |
| CF | \(\varphi_X(t)=(1-i\theta t)^{-\alpha}\) |
| Entropy | \(H = \alpha + \log\theta + \log\Gamma(\alpha) + (1-\alpha)\psi(\alpha)\) |
Key closure properties:
Scaling: if \(X\sim\text{Gamma}(\alpha,\theta)\) and \(c>0\), then \(cX\sim\text{Gamma}(\alpha,c\theta)\).
Additivity (shared scale): if \(X_i\sim\text{Gamma}(\alpha_i,\theta)\) independent, then \(\sum_i X_i\sim\text{Gamma}(\sum_i\alpha_i,\theta)\).
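Both closure properties are easy to verify by simulation; a quick moment check with NumPy's built-in sampler:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5
a1, a2 = 2.0, 3.5
n = 200_000

# Additivity: independent Gammas with a shared scale add their shapes
s = rng.gamma(a1, theta, size=n) + rng.gamma(a2, theta, size=n)
mean_ok = np.isclose(s.mean(), (a1 + a2) * theta, rtol=0.02)
var_ok = np.isclose(s.var(), (a1 + a2) * theta**2, rtol=0.05)

# Scaling: multiplying by c > 0 rescales theta to c*theta
c = 3.0
t = c * rng.gamma(a1, theta, size=n)
scale_ok = np.isclose(t.mean(), a1 * (c * theta), rtol=0.02)
```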
alpha, theta = 2.5, 1.3
mean_theory = alpha * theta
var_theory = alpha * theta**2
skew_theory = 2 / np.sqrt(alpha)
excess_kurt_theory = 6 / alpha
entropy_theory = gamma_entropy(alpha, theta)
mean_theory, var_theory, skew_theory, excess_kurt_theory, entropy_theory
(3.25, 4.2250000000000005, 1.2649110640673518, 2.4, 1.9923121739725458)
5) Parameter interpretation#
Shape \(\alpha\)#
Controls how concentrated the mass is and how quickly the density rises from 0.
If \(\alpha=1\), the Gamma becomes exponential (memoryless).
If \(0<\alpha<1\), the density has a singularity at 0 (it blows up as \(x\to 0^+\)) and the distribution is highly right-skewed.
As \(\alpha\) increases, the distribution becomes more symmetric and eventually looks close to normal.
Scale \(\theta\) (or rate \(\beta=1/\theta\))#
Scale stretches/compresses the distribution horizontally.
Mean and standard deviation scale linearly with \(\theta\):
\(\mathbb{E}[X]=\alpha\theta\)
\(\mathrm{SD}(X)=\sqrt{\alpha}\,\theta\)
A good mental model: \(\alpha\) affects the shape, \(\theta\) affects the units / scale.
# PDF: changing shape alpha (keep theta fixed)
theta_fixed = 1.0
alphas = [0.5, 1.0, 2.0, 5.0]
x = np.linspace(1e-6, 12, 600)

fig = go.Figure()
for a in alphas:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=gamma_pdf(x, a, theta_fixed),
            mode="lines",
            name=f"α={a:g}, θ={theta_fixed:g}",
        )
    )
fig.update_layout(
    title="Gamma PDF: effect of the shape α (scale θ=1)",
    xaxis_title="x",
    yaxis_title="pdf",
)
fig.show()
# PDF: changing scale theta (keep alpha fixed)
alpha_fixed = 2.0
thetas = [0.5, 1.0, 2.0]
x = np.linspace(1e-6, 20, 700)

fig = go.Figure()
for th in thetas:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=gamma_pdf(x, alpha_fixed, th),
            mode="lines",
            name=f"α={alpha_fixed:g}, θ={th:g}",
        )
    )
    # Mark the mean alpha * theta of each curve
    fig.add_vline(
        x=alpha_fixed * th,
        line_dash="dot",
        annotation_text=f"mean={alpha_fixed * th:.2f}",
        annotation_position="top",
    )
fig.update_layout(
    title="Gamma PDF: effect of the scale θ (shape α=2)",
    xaxis_title="x",
    yaxis_title="pdf",
)
fig.show()
6) Derivations#
Expectation#
Start from the definition (for \(X\sim\text{Gamma}(\alpha,\theta)\)):

$$ \mathbb{E}[X] = \int_0^\infty x\,\frac{x^{\alpha-1}e^{-x/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}\,dx = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} \int_0^\infty x^{\alpha} e^{-x/\theta}\,dx. $$

Substitute \(u=x/\theta\) (so \(x=\theta u\), \(dx=\theta\,du\)):

$$ \mathbb{E}[X] = \frac{\theta}{\Gamma(\alpha)} \int_0^\infty u^{\alpha} e^{-u}\,du = \frac{\theta\,\Gamma(\alpha+1)}{\Gamma(\alpha)} = \alpha\theta. $$

So \(\mathbb{E}[X]=\alpha\theta\).
Variance#
Similarly, the same substitution gives the second moment:

$$ \mathbb{E}[X^2] = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} \int_0^\infty x^{\alpha+1} e^{-x/\theta}\,dx = \frac{\theta^2\,\Gamma(\alpha+2)}{\Gamma(\alpha)} = \theta^2\,\alpha(\alpha+1). $$

Then:

$$ \mathrm{Var}(X)=\mathbb{E}[X^2] - \mathbb{E}[X]^2 = \theta^2\alpha(\alpha+1) - (\alpha\theta)^2 = \alpha\theta^2. $$
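The derived mean and variance can be confirmed by numerical integration against the PDF, using `scipy.integrate.quad`:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

alpha, theta = 2.5, 1.3
pdf = stats.gamma(a=alpha, scale=theta).pdf

# First and second moments computed by quadrature over the support
m1, _ = quad(lambda x: x * pdf(x), 0.0, np.inf)
m2, _ = quad(lambda x: x**2 * pdf(x), 0.0, np.inf)

mean_ok = np.isclose(m1, alpha * theta)
var_ok = np.isclose(m2 - m1**2, alpha * theta**2)
```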
Likelihood (i.i.d. sample)#
Let \(x_1,\dots,x_n\) be i.i.d. from \(\text{Gamma}(\alpha,\theta)\) with \(x_i>0\).
The log-likelihood is:

$$ \ell(\alpha,\theta) = \sum_{i=1}^n \log f(x_i\mid\alpha,\theta) = (\alpha-1)\sum_i \log x_i - \frac{1}{\theta}\sum_i x_i - n\big(\log\Gamma(\alpha) + \alpha\log\theta\big). $$
The MLE for \(\theta\) given \(\alpha\) comes from setting \(\partial\ell/\partial\theta = 0\):

$$ \hat{\theta}(\alpha) = \frac{\bar{x}}{\alpha}. $$
Solving for \(\hat{\alpha}\) jointly requires a numerical root find; one common equation is:

$$ \log\hat{\alpha} - \psi(\hat{\alpha}) = \log\bar{x} - \overline{\log x}. $$
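A minimal sketch of that root find using `scipy.optimize.brentq` (the left-hand side decreases from \(+\infty\) to \(0\) on \((0,\infty)\), so the root is unique; the bracket endpoints below are illustrative choices):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import psi

rng = np.random.default_rng(0)
x = rng.gamma(3.0, 1.5, size=5000)  # simulated data: true alpha=3, theta=1.5

# Right-hand side: log(xbar) - mean(log x), strictly positive by Jensen
s = np.log(x.mean()) - np.mean(np.log(x))

# Solve log(a) - psi(a) = s for alpha-hat, then theta-hat = xbar / alpha-hat
a_hat = brentq(lambda a: np.log(a) - psi(a) - s, 1e-8, 1e8)
theta_hat = x.mean() / a_hat
```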
def gamma_loglik(x, alpha, theta):
    """Log-likelihood of an i.i.d. sample under Gamma(alpha, theta)."""
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        return -np.inf
    if np.any(x <= 0):
        return -np.inf
    n = x.size
    return (
        (alpha - 1) * np.sum(np.log(x))
        - np.sum(x) / theta
        - n * (gammaln(alpha) + alpha * np.log(theta))
    )
# Quick sanity check: theta-hat given alpha
alpha_true, theta_true = 3.0, 1.5
x = stats.gamma(a=alpha_true, scale=theta_true).rvs(size=2000, random_state=rng)
theta_hat_if_alpha_known = x.mean() / alpha_true
theta_hat_if_alpha_known, theta_true
(1.5106385112629404, 1.5)
7) Sampling & simulation (NumPy-only)#
Two ideas#
Integer shape (\(\alpha=k\in\mathbb{N}\), Erlang): sum \(k\) i.i.d. \(\text{Exponential}(\theta)\) draws, e.g. \(-\theta\sum_{i=1}^k \log U_i\) with \(U_i\sim\text{Uniform}(0,1)\).
General shape (\(\alpha>0\)): use the Marsaglia–Tsang rejection sampler (fast and widely used).
Marsaglia–Tsang (2000) sketch#
For \(\alpha\ge 1\):
Set \(d = \alpha - 1/3\), \(c = 1/\sqrt{9d}\).
Repeat:
draw \(Z\sim\mathcal{N}(0,1)\) and set \(V=(1+cZ)^3\).
draw \(U\sim\text{Uniform}(0,1)\).
accept \(X=dV\) if the (cheap) squeeze test passes; otherwise fall back to the exact log acceptance test \(\log U < \tfrac{1}{2}Z^2 + d\,(1 - V + \log V)\).
For \(0<\alpha<1\) we use a reduction:
If \(Y\sim\text{Gamma}(\alpha+1,\theta)\) and \(U\sim\text{Uniform}(0,1)\) independent, then \(X = Y\,U^{1/\alpha} \sim \text{Gamma}(\alpha,\theta)\).
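That reduction is worth a quick empirical check before relying on it: the boosted-then-shrunk samples should reproduce the Gamma mean \(\alpha\theta\) and variance \(\alpha\theta^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, theta = 0.4, 1.0
n = 200_000

# Boost the shape to alpha + 1 >= 1, then shrink by U^(1/alpha)
y = rng.gamma(alpha + 1.0, theta, size=n)
u = rng.random(n)
x = y * u ** (1.0 / alpha)

mean_ok = np.isclose(x.mean(), alpha * theta, rtol=0.05)
var_ok = np.isclose(x.var(), alpha * theta**2, rtol=0.05)
```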
Below is a reference implementation using NumPy only (no scipy.stats.gamma.rvs).
def _gamma_rvs_mt_shape_ge_1(alpha, n, rng):
    """Marsaglia–Tsang sampler for Gamma(alpha, 1) with alpha >= 1."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    out = np.empty(n, dtype=float)
    filled = 0
    while filled < n:
        m = n - filled
        z = rng.normal(size=m)
        v = (1.0 + c * z) ** 3
        valid = v > 0
        if not np.any(valid):
            continue
        z = z[valid]
        v = v[valid]
        u = rng.random(size=v.size)
        # Squeeze step + main acceptance test
        accept = (u < 1.0 - 0.0331 * (z**4)) | (
            np.log(u) < 0.5 * z**2 + d * (1.0 - v + np.log(v))
        )
        accepted = d * v[accept]
        k = accepted.size
        out[filled : filled + k] = accepted
        filled += k
    return out


def gamma_rvs_numpy(alpha, theta=1.0, size=1, rng=None):
    """Sample from Gamma(alpha, theta) using NumPy only.

    Parameters
    ----------
    alpha : float
        Shape parameter (> 0).
    theta : float
        Scale parameter (> 0).
    size : int or tuple
        Output shape.
    rng : np.random.Generator
        Random number generator.
    """
    if rng is None:
        rng = np.random.default_rng()
    alpha = float(alpha)
    theta = float(theta)
    if alpha <= 0 or theta <= 0:
        raise ValueError("alpha and theta must be > 0")
    size_tuple = (size,) if isinstance(size, int) else tuple(size)
    n = int(np.prod(size_tuple))
    if alpha >= 1:
        x = _gamma_rvs_mt_shape_ge_1(alpha, n, rng)
    else:
        # Boost shape to alpha+1 >= 1, then apply the U^(1/alpha) correction
        y = _gamma_rvs_mt_shape_ge_1(alpha + 1.0, n, rng)
        u = rng.random(size=n)
        x = y * (u ** (1.0 / alpha))
    return (x * theta).reshape(size_tuple)
# Smoke test: mean/variance roughly match theory
alpha_test, theta_test = 2.5, 1.3
s = gamma_rvs_numpy(alpha_test, theta_test, size=50_000, rng=rng)
s.mean(), (alpha_test * theta_test), s.var(), (alpha_test * theta_test**2)
(3.2476485440524825, 3.25, 4.233222259786438, 4.2250000000000005)
8) Visualization#
We’ll visualize:
the PDF and how it compares to a Monte Carlo histogram
the CDF and how it compares to an empirical CDF
We’ll use the NumPy-only sampler from above.
def ecdf(samples):
    """Return sorted sample values and empirical CDF heights."""
    x = np.sort(np.asarray(samples))
    y = np.arange(1, x.size + 1) / x.size
    return x, y
alpha_viz, theta_viz = 2.5, 1.3
n_viz = 80_000
samples = gamma_rvs_numpy(alpha_viz, theta_viz, size=n_viz, rng=rng)
x_max = float(np.quantile(samples, 0.995))
x_grid = np.linspace(1e-6, x_max, 600)
# PDF + histogram
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=samples,
nbinsx=70,
histnorm="probability density",
name="Monte Carlo samples",
opacity=0.55,
)
)
fig.add_trace(
go.Scatter(
x=x_grid,
y=gamma_pdf(x_grid, alpha_viz, theta_viz),
mode="lines",
name="Theoretical PDF",
line=dict(width=3),
)
)
fig.update_layout(
title=f"Gamma(α={alpha_viz:g}, θ={theta_viz:g}): histogram vs PDF",
xaxis_title="x",
yaxis_title="density",
bargap=0.02,
)
fig.show()
# CDF + empirical CDF
xs, ys = ecdf(samples)
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x_grid,
y=gamma_cdf(x_grid, alpha_viz, theta_viz),
mode="lines",
name="Theoretical CDF",
line=dict(width=3),
)
)
fig.add_trace(
go.Scatter(
x=xs[::100],
y=ys[::100],
mode="markers",
name="Empirical CDF (subsampled)",
marker=dict(size=5),
)
)
fig.update_layout(
title=f"Gamma(α={alpha_viz:g}, θ={theta_viz:g}): empirical CDF vs CDF",
xaxis_title="x",
yaxis_title="CDF",
)
fig.show()
# Monte Carlo moment check
sample_mean = samples.mean()
sample_var = samples.var()
theory_mean = alpha_viz * theta_viz
theory_var = alpha_viz * theta_viz**2
sample_mean, theory_mean, sample_var, theory_var
(3.2598087590265608, 3.25, 4.286106165978027, 4.2250000000000005)
9) SciPy integration (scipy.stats.gamma)#
SciPy’s Gamma distribution is `scipy.stats.gamma`.
Shape is passed as `a`.
Scale is passed as `scale`.
There is also a `loc` parameter (a shift). For the standard Gamma distribution, use `loc=0`.
So the mapping is: \(\text{Gamma}(\alpha,\theta)\) ↔ `stats.gamma(a=alpha, loc=0, scale=theta)`.
If you’re using a rate \(\beta\) instead: `scale = 1 / beta`.
alpha_true, theta_true = 3.0, 1.5
dist = stats.gamma(a=alpha_true, loc=0, scale=theta_true)
x = np.linspace(0, 15, 500)
pdf = dist.pdf(x)
cdf = dist.cdf(x)
samples_scipy = dist.rvs(size=5000, random_state=rng)
# MLE fit (note: constrain loc=0 to match the usual Gamma support)
a_hat, loc_hat, scale_hat = stats.gamma.fit(samples_scipy, floc=0)
a_hat, loc_hat, scale_hat
(2.970249528831039, 0, 1.5243399321816582)
10) Statistical use cases#
A) Hypothesis testing#
Goodness-of-fit: do Gamma parameters explain the data?
Model comparison: e.g. exponential vs general Gamma (is \(\alpha=1\) plausible?)
B) Bayesian modeling#
Gamma is a standard conjugate prior for rates:
Poisson likelihood (counts) with unknown rate
Exponential likelihood (waiting times) with unknown rate
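Concretely, in the rate parameterization \(\lambda\sim\text{Gamma}(\alpha_0,\beta_0)\), conjugacy gives closed-form posteriors. For Poisson counts \(y_1,\dots,y_n\):

$$ \lambda \mid y_{1:n} \sim \text{Gamma}\Big(\alpha_0 + \sum_{i=1}^n y_i,\; \beta_0 + n\Big), $$

and for exponential waiting times \(t_1,\dots,t_n\):

$$ \lambda \mid t_{1:n} \sim \text{Gamma}\Big(\alpha_0 + n,\; \beta_0 + \sum_{i=1}^n t_i\Big). $$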
C) Generative modeling / hierarchical models#
Gamma is often used as a latent positive variable (rates, precisions, intensities). A classic example is a Poisson–Gamma mixture, which creates overdispersed count data (marginally negative binomial).
# A) Hypothesis testing example: Exponential (alpha=1) vs Gamma (alpha free)
# Generate data from a Gamma alternative
alpha_data, theta_data = 2.0, 2.0
data = stats.gamma(a=alpha_data, scale=theta_data).rvs(size=2500, random_state=rng)
# Fit the unrestricted Gamma
a_hat, loc_hat, scale_hat = stats.gamma.fit(data, floc=0)
# Null model: alpha = 1 (exponential). MLE for theta is just the sample mean.
theta0_hat = data.mean()
ll_null = gamma_loglik(data, alpha=1.0, theta=theta0_hat)
ll_alt = gamma_loglik(data, alpha=a_hat, theta=scale_hat)
lr_stat = 2 * (ll_alt - ll_null)
p_value_lrt = stats.chi2.sf(lr_stat, df=1)
# Goodness-of-fit (KS) for the fitted Gamma
D_ks, p_value_ks = stats.kstest(data, "gamma", args=(a_hat, 0, scale_hat))
lr_stat, p_value_lrt, D_ks, p_value_ks
(601.8915919589344,
6.491780106731331e-133,
0.01017051215045639,
0.9558863897485597)
# B) Bayesian modeling: Gamma prior for a Poisson rate
# Use rate parameterization for the prior/posterior: Gamma(α, β) with mean α/β.
alpha0, beta0 = 2.0, 1.0
lambda_true = 3.0
n = 25
y = rng.poisson(lam=lambda_true, size=n)
alpha_post = alpha0 + y.sum()
beta_post = beta0 + n
# Convert rate β to scale θ=1/β for our Gamma(alpha, theta) helper functions
lam_grid = np.linspace(1e-6, max(10, 3 * lambda_true), 600)
prior_pdf = gamma_pdf(lam_grid, alpha0, theta=1 / beta0)
post_pdf = gamma_pdf(lam_grid, alpha_post, theta=1 / beta_post)
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=lam_grid,
y=prior_pdf,
mode="lines",
name=f"prior Γ(α={alpha0:g}, β={beta0:g})",
line=dict(width=3),
)
)
fig.add_trace(
go.Scatter(
x=lam_grid,
y=post_pdf,
mode="lines",
name=f"posterior Γ(α={alpha_post:g}, β={beta_post:g})",
line=dict(width=3),
)
)
fig.add_vline(x=lambda_true, line_dash="dash", line_color="black", annotation_text="true λ")
fig.update_layout(
title="Gamma–Poisson conjugacy: prior → posterior on λ",
xaxis_title="λ",
yaxis_title="density",
)
fig.show()
y.sum(), y.mean(), (alpha_post / beta_post)
(68, 2.72, 2.6923076923076925)
# C) Generative modeling: Poisson–Gamma mixture (overdispersed counts)
alpha_latent, theta_latent = 5.0, 0.6 # latent lambda ~ Gamma(alpha, theta)
n_obs = 20_000
lambdas = gamma_rvs_numpy(alpha_latent, theta_latent, size=n_obs, rng=rng)
counts_mixture = rng.poisson(lam=lambdas)
mean_counts = counts_mixture.mean()
counts_poisson = rng.poisson(lam=mean_counts, size=n_obs)
max_k = int(np.quantile(counts_mixture, 0.995))
bin_spec = dict(start=-0.5, end=max_k + 0.5, size=1)
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=counts_mixture,
xbins=bin_spec,
histnorm="probability",
name="Poisson–Gamma mixture",
opacity=0.6,
)
)
fig.add_trace(
go.Histogram(
x=counts_poisson,
xbins=bin_spec,
histnorm="probability",
name="Poisson (same mean)",
opacity=0.6,
)
)
fig.update_layout(
title="Overdispersion from a Poisson–Gamma mixture",
xaxis_title="count",
yaxis_title="probability",
barmode="overlay",
)
fig.show()
counts_mixture.mean(), counts_mixture.var(), counts_poisson.var()
(2.98515, 4.7931294775, 2.9583533975000003)
11) Pitfalls#
Parameterization confusion
Some sources use rate \(\beta\) (larger \(\beta\) means smaller values), others use scale \(\theta=1/\beta\).
`scipy.stats.gamma` uses `scale` (not rate).
`loc` in SciPy
`scipy.stats` distributions include a shift `loc` by default. For the standard Gamma with support \((0,\infty)\), you typically want `loc=0`.
Behavior near zero
If \(0<\alpha<1\), the PDF diverges as \(x\to 0^+\). Plots should avoid including exactly \(x=0\).
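A tiny demonstration of that divergence with SciPy: for \(\alpha=1/2\) the density keeps growing as the evaluation point approaches zero.

```python
import numpy as np
from scipy import stats

# For 0 < alpha < 1 the density is unbounded as x -> 0+
pdf = stats.gamma(a=0.5, scale=1.0).pdf
increasing_toward_zero = pdf(1e-4) > pdf(1e-2) > pdf(1e-1)
```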
Numerical stability
Computing the PDF directly can underflow/overflow for extreme parameters.
Prefer `logpdf` (and `gammaln`) when doing inference or optimization.
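A concrete instance of the underflow problem: far in the tail, `pdf` returns exactly zero in double precision while `logpdf` remains perfectly usable.

```python
import numpy as np
from scipy import stats

d = stats.gamma(a=2.0, scale=1.0)
x = 800.0

# The density ~ x * exp(-x) is far below the smallest double here
pdf_val = d.pdf(x)      # underflows to 0.0
log_val = d.logpdf(x)   # finite (about log(x) - x), safe for optimization
```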
Invalid parameters / data
Gamma requires \(\alpha>0\), \(\theta>0\), and data \(x_i>0\).
If your data include zeros or negatives, consider a shifted model (`loc`), a hurdle model, or a different distribution.
12) Summary#
Gamma is a continuous distribution on \((0,\infty)\) with parameters shape \(\alpha\) and scale \(\theta\).
It naturally models waiting times and sums of exponentials, and it’s a workhorse for positive skewed data.
Key formulas: \(\mathbb{E}[X]=\alpha\theta\), \(\mathrm{Var}(X)=\alpha\theta^2\), \(M_X(t)=(1-\theta t)^{-\alpha}\).
Sampling can be done efficiently with Marsaglia–Tsang; SciPy provides `stats.gamma` for `pdf`, `cdf`, `rvs`, and `fit`.
Always watch for parameterization (scale vs rate), `loc`, and numerical stability near \(x=0\) / extreme parameters.
References#
Marsaglia, G. and Tsang, W. W. (2000). A Simple Method for Generating Gamma Variables.
SciPy documentation: `scipy.stats.gamma`
Any standard mathematical statistics text (Gamma function, incomplete gamma, conjugacy)